Telephone speech recognition from large lists of Czech words

Author

Jan Nouza

Abstract

In this paper we investigate methods suitable for practical implementation in a recognition system that classifies telephone input in the form of isolated words or phrases belonging to large vocabularies with equiprobable entries, such as personal names, city names, and other place names. Specifically for the Czech language, we propose a pronunciation lexicon with a prefix-stem-suffix arrangement, combined with appropriate caching and pruning techniques and a two-level (monophone and triphone) classification. In experiments on telephone speech containing items from a 5347-word city-name vocabulary, we obtained a 90.1 % recognition score with an average time of 645 ms per word. The acoustic models for these experiments were trained on the only available multi-speaker database, which was originally recorded with a microphone and later transferred over telephone lines and automatically realigned.
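
The combination of a prefix-stem-suffix lexicon, caching, pruning, and two-level classification can be pictured with a small sketch. This is only an illustration under stated assumptions, not the paper's implementation: the lexicon entries, the way words are split into parts, the score tables, and the names `coarse_part_score` and `recognize` are all hypothetical, and the made-up numbers stand in for real monophone and triphone acoustic scores.

```python
from functools import lru_cache

# Toy lexicon (hypothetical split, for illustration only):
# word -> (prefix, stem, suffix) pronunciation parts.
LEXICON = {
    "Nová Paka": ("nova", "pak", "a"),
    "Nová Role": ("nova", "rol", "e"),
    "Stará Paka": ("stara", "pak", "a"),
    "Stará Role": ("stara", "rol", "e"),
}

# Made-up log-likelihoods standing in for real acoustic scores.
# Level 1 (cheap, monophone-like): one score per lexicon part.
COARSE_PART_SCORES = {
    "nova": -1.0, "stara": -5.0,
    "pak": -2.0, "rol": -2.5,
    "a": -0.5, "e": -0.5,
}
# Level 2 (expensive, triphone-like): one score per whole word.
FINE_WORD_SCORES = {
    "Nová Paka": -3.2, "Nová Role": -2.9,
    "Stará Paka": -7.0, "Stará Role": -7.4,
}

coarse_evaluations = 0  # counts how many parts were actually scored


@lru_cache(maxsize=None)
def coarse_part_score(part: str) -> float:
    """Score one lexicon part; the cache ensures shared parts are scored once."""
    global coarse_evaluations
    coarse_evaluations += 1
    return COARSE_PART_SCORES[part]


def recognize(beam: int = 2):
    """Two-level search: prune with cheap scores, rescore the survivors."""
    # Level 1: sum cached part scores for every word in the vocabulary.
    coarse = {
        word: sum(coarse_part_score(p) for p in parts)
        for word, parts in LEXICON.items()
    }
    # Pruning: keep only the `beam` best candidates.
    shortlist = sorted(coarse, key=coarse.get, reverse=True)[:beam]
    # Level 2: pick the best shortlisted word using the finer scores.
    best = max(shortlist, key=FINE_WORD_SCORES.__getitem__)
    return best, shortlist


best, shortlist = recognize(beam=2)
```

In this toy setup the four words need twelve part lookups, but only six distinct parts are scored because shared prefixes, stems, and suffixes hit the cache. Only the shortlisted words reach the more expensive second level, which is the general idea behind pairing a cheap first pass with a detailed second one.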

Similar resources

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

Speech Recognition of Czech-Inclusion of Rare Words Helps

Large vocabulary continuous speech recognition of inflective languages, such as Czech, Russian or Serbo-Croatian, is severely degraded by an excessive out-of-vocabulary rate. In this paper, we tackle the problem of vocabulary selection, language modeling and pruning for inflective languages. We show that by explicit reduction of the out-of-vocabulary rate we can achieve significant improvements in ...

Automatic Detection of Emphasized Words for Performance Enhancement of a Czech ASR System

This paper deals with the problem of detecting prosodically emphasized words in Czech speech. The main goal is to propose an automatic emphasized word detection system that would be a component of an automatic speech recognition (ASR) system and would enrich its text output by highlighting emphasized words. The detection method is based on Czech prosodic rules and uses speech signal intensity, pit...

Recent work on a preselection module for a flexible large vocabulary speech recognition system in telephone environment

At ICSLP’96 we presented a flexible, large vocabulary, speaker independent, isolated-word preselection system in a telephone environment, using a two stage, bottom-up strategy [6]. We achieved reasonable performance in large and very large vocabulary tasks, ranging from 1200 to 10000 words. In this paper, we describe recent studies we have carried out on the system, aimed at two directions: han...

Challenges in Speech Processing of Slavic Languages (Case Studies in Speech Recognition of Czech and Slovak)

Slavic languages pose a big challenge for researchers dealing with speech technology. They exhibit a large degree of inflection, namely declension of nouns, pronouns and adjectives, and conjugation of verbs. This has a large impact on the size of lexical inventories in these languages, and significantly complicates the design of text-to-speech and, in particular, speech-to-text systems. In the ...

Publication date: 2000
